A Causal Classification of Orthography Errors in Web Texts
Abstract
Errors, even at the spelling level, can provide useful insight into the nature of a written text. This paper presents a classification of spelling errors in Web texts based on their causes (misspellings, typos and intentional deviations), linking them to the attitudes of their authors and the circumstances of their writing. Examples are drawn from blog and forum entries in English and Italian.
Similar resources
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, organizing and managing them requires a tool that delivers the information users search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially in text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
Automatic Identification of Learners' Language Background Based on Their Writing in Czech
The goal of this study is to investigate whether learners’ written data in highly inflectional Czech can suggest a consistent set of clues for automatic identification of the learners’ L1 background. For our experiments, we use texts written by learners of Czech, which have been automatically and manually annotated for errors. We define two classes of learners: speakers of Indo-European languag...
Are Blogs Edited? A Linguistic Survey of Italian Blogs Using Search Engines
Many blogs are written by people with no formal training in public writing; this could suggest a low level of editing and general correctness. A quantitative analysis of misspellings, however, shows that in their orthography Italian blogs are as well revised as conventional Italian newspaper texts. On the other hand, their editing is more careful than the editing of the average of Italian web p...
Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words
This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...
Publication date: 2006